Implementation: Sarsamax
The pseudocode for Sarsamax (or Q-learning) can be found below.

Sarsamax is guaranteed to converge under the same conditions that guarantee convergence of Sarsa.
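For readers who want something concrete next to the pseudocode, here is a minimal Python sketch of the single-step Sarsamax (Q-learning) update and an epsilon-greedy action choice. It is only an illustration under stated assumptions: the names epsilon_greedy and sarsamax_update, the argument order, and the choice to store Q as a dictionary mapping each state to a list of action values are hypothetical and are not taken from the notebook.

```python
from collections import defaultdict
import random


def epsilon_greedy(Q, state, n_actions, epsilon):
    """Return a random action with probability epsilon, else a greedy one.

    Hypothetical helper: Q maps each state to a list of action values.
    """
    if random.random() < epsilon:
        return random.randrange(n_actions)
    values = Q[state]
    return max(range(n_actions), key=lambda a: values[a])


def sarsamax_update(Q, state, action, reward, next_state, done, alpha, gamma):
    """Apply one Sarsamax (Q-learning) update to Q in place.

    The target bootstraps from the largest action value in the next state,
    regardless of which action the agent actually takes there.
    """
    # A terminal next state contributes no future value.
    best_next = 0.0 if done else max(Q[next_state])
    td_target = reward + gamma * best_next
    Q[state][action] += alpha * (td_target - Q[state][action])
    return Q[state][action]


# Example setup (assumed): two actions, all estimates start at zero.
n_actions = 2
Q = defaultdict(lambda: [0.0] * n_actions)
```

Within an episode, one would typically choose an action with epsilon_greedy, step the environment, and then call sarsamax_update with the observed reward and next state. Taking the maximum over next-state action values is what distinguishes this update from Sarsa, which instead uses the value of the action actually selected next.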
Please use the next concept to complete Part 3: TD Control: Q-learning of Temporal_Difference.ipynb. Remember to save your work!
If you'd like to reference the pseudocode while working on the notebook, you are encouraged to open this sheet in a new window.
Feel free to check your solution by looking at the corresponding section in Temporal_Difference_Solution.ipynb.